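Analogical proportions are statements of the form "A is to B as C is to D". They constitute an inference tool that provides a logical framework for addressing learning, transfer, and explainability concerns, and they find useful applications in artificial intelligence and natural language processing. In this paper, we address two problems: analogy detection and analogy resolution in morphology. Multiple symbolic approaches tackle the problem of analogies in morphology and achieve competitive performance. We show that a data-driven strategy can outperform these models. We propose an approach that uses deep learning to detect and solve morphological analogies. It encodes the structural properties of analogical proportions and relies on a specifically designed embedding model that captures the morphological characteristics of words. We demonstrate our model's competitive performance on analogy detection and resolution over multiple languages. We provide an analysis of the impact of balancing the training data and evaluate the robustness of our approach to input perturbation.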
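As a toy illustration of the proportion form "A is to B as C is to D", the sketch below solves and detects analogies with a purely symbolic suffix-rewriting rule. It is not the deep learning approach the abstract proposes; the function names `solve_suffix_analogy` and `is_suffix_analogy` are hypothetical, and the rule assumes that A and B differ only in their suffix.

```python
from typing import Optional


def solve_suffix_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : ? assuming a -> b only rewrites a suffix."""
    # Find the longest common prefix of a and b.
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    old_suffix, new_suffix = a[i:], b[i:]
    # The same rewrite applies to c only if c ends with the old suffix.
    if not c.endswith(old_suffix):
        return None
    stem = c[:len(c) - len(old_suffix)]
    return stem + new_suffix


def is_suffix_analogy(a: str, b: str, c: str, d: str) -> bool:
    """Detect whether a : b :: c : d holds under the suffix rule."""
    return solve_suffix_analogy(a, b, c) == d
```

For instance, `solve_suffix_analogy("make", "making", "take")` strips the suffix "e" and appends "ing". A rule this rigid fails on prefixing, infixing, or irregular forms, which is the kind of limitation that motivates learned approaches.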
Artificial intelligence (AI) systems based on deep neural networks (DNNs) and machine learning (ML) algorithms are increasingly used to solve critical problems in bioinformatics, biomedical informatics, and precision medicine. However, complex DNN or ML models that are unavoidably opaque and perceived as black-box methods may not be able to explain why and how they make certain decisions. Such black-box models are difficult to comprehend not only for targeted users and decision-makers but also for AI developers. Moreover, in sensitive areas like healthcare, explainability and accountability are not only desirable properties of AI but also legal requirements, especially when AI may have significant impacts on human lives. Explainable artificial intelligence (XAI) is an emerging field that aims to mitigate the opaqueness of black-box models and make it possible to interpret transparently how AI systems make their decisions. An interpretable ML model can explain how it makes predictions and which factors affect its outcomes. The majority of state-of-the-art interpretable ML methods have been developed in a domain-agnostic way and originate from computer vision, automated reasoning, or even statistics. Many of these methods cannot be directly applied to bioinformatics problems without prior customization, extension, and domain adaptation. In this paper, we discuss the importance of explainability with a focus on bioinformatics. We analyse and comprehensively review model-specific and model-agnostic interpretable ML methods and tools. Via several case studies covering bioimaging, cancer genomics, and biomedical text mining, we show how bioinformatics research could benefit from XAI methods and how they could help improve decision fairness.
With the rise in high-resolution remote sensing technologies, there has been an explosion in the amount of data available for forest monitoring, and an accompanying growth in artificial intelligence applications that automatically derive forest properties of interest from these datasets. Many studies use their own data at small spatio-temporal scales and demonstrate an application of an existing or adapted data science method for a particular task. This approach often involves intensive and time-consuming data collection and processing, but generates results restricted to specific ecosystems and sensor types. There is a lack of widespread acknowledgement of how the types and structures of the data used affect the performance and accuracy of analysis algorithms. To accelerate progress in the field more efficiently, benchmarking datasets upon which methods can be tested and compared are sorely needed. Here, we discuss how a lack of standardisation impacts confidence in the estimation of key forest properties, and how considerations of data collection need to be accounted for in assessing method performance. We present pragmatic requirements and considerations for the creation of rigorous, useful benchmarking datasets for forest monitoring applications, and discuss how tools from modern data science can improve the use of existing data. We list a set of example large-scale datasets that could contribute to benchmarking, and present a vision for how community-driven, representative benchmarking initiatives could benefit the field.
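Owing to their effectiveness in modelling complex problems and handling high-dimensional datasets, deep neural networks (DNNs) have been shown to outperform traditional machine learning algorithms in a wide range of application domains. However, many real-life datasets are of increasingly high dimensionality, and a large number of their features may be irrelevant to the task at hand. Including such features not only introduces unwanted noise but also increases computational complexity. Furthermore, because of the high non-linearity and dependency among a large number of features, DNN models tend to be unavoidably opaque and are perceived as black-box methods, since their internal functioning is not well understood. A well-interpretable model can identify statistically significant features and explain the way they affect the model's outcome. In this paper, we propose an efficient method to improve the interpretability of black-box models for classification tasks on high-dimensional datasets. To this end, we first train a black-box model on a high-dimensional dataset to learn the embeddings on which the classification is performed. To decompose the inner workings of the black-box model and identify the top-k important features, we employ different probing and perturbing techniques. We then approximate the behaviour of the black-box model with an interpretable surrogate model on the top-k feature space. Finally, we derive decision rules and local explanations from the surrogate model to explain individual decisions. When tested on different datasets with dimensionality varying between 50 and 20,000, our approach outperforms state-of-the-art methods such as TabNet, XGBoost, and SHAP-based interpretability techniques.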
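The pipeline described above (black-box model, perturbation-based top-k feature selection, interpretable surrogate, decision rule) can be sketched roughly as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the authors' implementation: `black_box` is a hypothetical stand-in for a trained DNN, the data are synthetic, column shuffling stands in for the unspecified probing and perturbing techniques, and a one-split decision stump stands in for the surrogate model.

```python
import random


def black_box(x):
    # Hypothetical stand-in for a trained DNN classifier (assumption):
    # only features 3 and 7 influence the prediction.
    return int(0.7 * x[3] + 0.3 * x[7] > 0.5)


def perturbation_importance(predict, X, rng):
    """Score each feature by how often shuffling it flips the prediction."""
    base = [predict(x) for x in X]
    scores = []
    for j in range(len(X[0])):
        column = [x[j] for x in X]
        rng.shuffle(column)  # perturb feature j across samples
        changed = 0
        for x, v, y in zip(X, column, base):
            perturbed = list(x)
            perturbed[j] = v
            changed += predict(perturbed) != y
        scores.append(changed / len(X))
    return scores


def fit_stump(X, y, features):
    """Fit a one-split surrogate that best agrees with the labels y."""
    best = None
    for j in features:
        for t in sorted({x[j] for x in X}):
            for hi_label in (0, 1):
                pred = [hi_label if x[j] > t else 1 - hi_label for x in X]
                fidelity = sum(p == q for p, q in zip(pred, y)) / len(y)
                if best is None or fidelity > best[0]:
                    best = (fidelity, j, t, hi_label)
    return best


rng = random.Random(0)
# Synthetic data: 200 samples, 20 features, uniform in [0, 1).
X = [[rng.random() for _ in range(20)] for _ in range(200)]

# Step 1: rank features by perturbation importance and keep the top k = 2.
scores = perturbation_importance(black_box, X, rng)
top_k = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:2]

# Step 2: fit the surrogate to the black-box predictions on the top-k features.
y_bb = [black_box(x) for x in X]
fidelity, feat, thr, hi_label = fit_stump(X, y_bb, top_k)

# Step 3: read a decision rule off the surrogate.
rule = f"IF x[{feat}] > {thr:.2f} THEN class {hi_label} ELSE class {1 - hi_label}"
print(top_k, rule, f"fidelity={fidelity:.2f}")
```

Here fidelity is the fraction of samples on which the surrogate agrees with the black-box model, not accuracy against ground-truth labels. A richer surrogate, such as a shallow decision tree, would yield multi-condition rules at the cost of longer explanations.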